CSV vs XLSX Formats and Importing Data in R

Mastering data import in R

Masumbuko Semba

2024-02-16

Learning Agenda

  1. Get familiar with R and RStudio
  2. Data structure and data types
  3. Reading and writing data in RStudio
  4. Tidying and Data manipulation with tidyverse
  5. Plotting and Visualization
  6. Descriptive Statistics
  7. Inferential Statistics
  8. Modelling and simulation
  9. Spatial Handling and Analysis

csv Format

  • Definition: CSV (Comma-Separated Values)
  • Structure: Plain text file with data separated by commas
  • Advantages: Lightweight, easy to create and read, widely supported

Note

Disadvantages: No formatting, no stored data types, no support for multiple sheets
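
To make the plain-text structure concrete, here is a minimal sketch that uses only base R. The `fish` table and its values are made up for illustration and do not come from any real dataset.

```r
# A small made-up table (hypothetical values, for illustration only)
fish <- data.frame(
  species = c("tuna", "sardine"),
  length_cm = c(120.5, 18.2)
)

# Write it to a temporary CSV file, without row names
csv_path <- tempfile(fileext = ".csv")
write.csv(fish, csv_path, row.names = FALSE)

# Read the file back as raw text: one header line, then one line per row,
# with the values on each line separated by commas
csv_lines <- readLines(csv_path)
print(csv_lines)
```

Opening the same file in a text editor shows exactly these lines, which is why CSV is so widely supported.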

csv format …

  • A CSV file is read with the read_csv() function
  • The function comes from the readr package
  • readr is part of the tidyverse, an ecosystem of packages
  • Installing the tidyverse package therefore installs readr as well
# Install the tidyverse package (only needed once per computer)
install.packages("tidyverse")

Warning

Loading packages in R is necessary to use their functions and features.
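
Once the package is installed, reading a CSV file could look like the sketch below. It assumes readr is installed; the file contents and the name `catch` are hypothetical, and a temporary file stands in for a real data file.

```r
library(readr)  # library(tidyverse) would attach readr as well

# Create a small CSV file to read (made-up values, for illustration only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("species,length_cm", "tuna,120.5", "sardine,18.2"), csv_path)

# read_csv() returns a tibble and guesses the type of each column;
# show_col_types = FALSE silences the column specification message
catch <- read_csv(csv_path, show_col_types = FALSE)
print(catch)
```

With your own data, replace `csv_path` with the path to your file.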

xlsx format

  • Definition: XLSX (Excel Workbook)
  • Structure: ZIP-compressed collection of XML files holding data, styles, and multiple sheets
  • Advantages: Rich formatting, supports multiple sheets, formulas, charts

Warning

Disadvantages: Larger file size, more complex than CSV
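
As a quick illustration of the multiple-sheet structure, the sketch below lists the sheets of an example workbook. It assumes the readxl package is installed and uses `datasets.xlsx`, one of the example files that ships with readxl, in place of your own file.

```r
library(readxl)

# Path to an example workbook bundled with readxl
xlsx_path <- readxl_example("datasets.xlsx")

# excel_sheets() returns the sheet names as a character vector
sheets <- excel_sheets(xlsx_path)
print(sheets)
```

A CSV file has no equivalent of this: one file always holds exactly one table.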

xlsx format …

  • An XLSX file is read with the read_excel() function
  • The function comes from the readxl package
  • readxl is installed together with tidyverse, or on its own as shown here
  • It is not a core tidyverse package, so library(tidyverse) does not load it
# Install the readxl package (only needed once per computer)
install.packages("readxl")

Warning

When you install a package, it is not automatically loaded into your R environment
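
Reading a sheet from a workbook might then look like this minimal sketch. It assumes readxl is installed and again borrows the `datasets.xlsx` example file bundled with the package; the name `first_sheet` is chosen here for illustration.

```r
library(readxl)

# Path to an example workbook bundled with readxl
xlsx_path <- readxl_example("datasets.xlsx")

# Read one sheet into a tibble; `sheet` accepts a position (as here)
# or a sheet name given as a string
first_sheet <- read_excel(xlsx_path, sheet = 1)
head(first_sheet)
```

If `sheet` is left out, read_excel() reads the first sheet of the workbook.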

Loading packages …

  • Load the package in R to use its functions
  • The usual way to load a package is the library() function
  • Alternatively, the require() function also loads it, but it only warns if the package is missing
# Load the package
library(tidyverse)
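
The practical difference between the two loading functions shows up when a package is not installed. The sketch below uses only base R; `notInstalledPkg` is a hypothetical name assumed not to exist on your machine.

```r
# library() stops with an error when the package is not installed,
# so the problem is noticed immediately; tryCatch() captures it here
lib_result <- tryCatch(
  library("notInstalledPkg", character.only = TRUE),
  error = function(e) "error"
)

# require() only warns and returns FALSE, so a script carries on
# and fails later, when a function from the package is first called
req_result <- suppressWarnings(
  require("notInstalledPkg", character.only = TRUE, quietly = TRUE)
)

print(lib_result)
print(req_result)
```

For this reason library() is generally the safer choice at the top of a script.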
